Results 1 - 20 of 25
1.
Article in English | MEDLINE | ID: mdl-37252262

ABSTRACT

Multiple waves of COVID-19 have significantly affected emotional well-being in general, and many people were exposed to additional risks associated with mandatory restrictions. The objective of this research was to assess the immediate emotional impact expressed by Canadian Twitter users and to estimate its linear relationship with fluctuations in COVID caseloads using ARIMA time-series regression. We developed two Artificial Intelligence-based algorithms to extract tweets using 18 semantic terms related to social confinement and lockdown, and then geocoded the tweets to tag Canadian provinces. Tweets (n = 64,732) were classified as expressing positive, negative, or neutral sentiment using a word-based Emotion Lexicon. Our results indicated that, in tweets hash-tagged with social confinement and lockdown terms, Tweeters expressed a higher daily percentage of negative sentiments, comprising negative anticipation (30.1%), fear (28.1%), and anger (25.3%), than of positive sentiments, comprising positive anticipation (43.7%), trust (41.4%), and joy (14.9%), or of neutral sentiments, which mostly carried no emotion. In most provinces, negative sentiments took on average two to three days to emerge after caseloads increased, whereas positive sentiments took slightly longer, six to seven days, to subside. As daily caseloads increased, the negative sentiment percentage increased in Manitoba (by 68% per 100-case increase) and Atlantic Canada (by 89% per 100-case increase) in wave 1 (with 30% of the variation explained), while other provinces showed resilience. The opposite was noted for positive sentiments. The proportion of variation in the daily percentage of emotional expression explained by daily caseloads in wave 1 was 30% for negative, 42% for neutral, and 2.1% for positive sentiments, indicating that the emotional impact is multifactorial. These provincial-level differences in impact, with varying latency periods, should be considered when planning geographically targeted, time-sensitive, confinement-related psychological health promotion efforts. Artificial Intelligence-based geocoded sentiment analysis of Twitter data opens possibilities for targeted, rapid detection of emotional sentiment.

2.
Neural Netw ; 159: 25-33, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36525915

ABSTRACT

Recurrent Neural Network (RNN) models have been applied in different domains, producing high accuracies on time-dependent data. However, RNNs have long suffered from exploding gradients during training, mainly due to their recurrent process. In this context, we propose a variant of the scalar gated FastRNN architecture, called Scalar Gated Orthogonal Recurrent Neural Networks (SGORNN). SGORNN utilizes orthogonal matrices at the recurrent step. Our experiments evaluate SGORNN using two recently proposed orthogonal parametrizations for the recurrent weights of an RNN. We present a constraint on the scalar gates of SGORNN, which is easily enforced at training time, to provide a probabilistic generalization gap that grows linearly with the length of the sequences processed. Next, we provide bounds on the gradients of SGORNN to show the impossibility of exponentially exploding gradients through time. Our experimental results on the addition problem confirm that our combination of orthogonal and scalar gated RNNs is able to outperform other orthogonal RNNs and LSTM on long sequences. We further evaluate SGORNN on the HAR-2 classification task, where it improves upon the accuracy of several models while using far fewer parameters than standard RNNs. Finally, we evaluate SGORNN on the Penn Treebank word-level language modeling task, where it again outperforms its related architectures and shows comparable performance to LSTM while using far fewer parameters. Overall, SGORNN shows higher representation capacity than the other orthogonal RNNs tested, suffers from less overfitting than other models in our experiments, benefits from a decrease in parameter count, and alleviates exploding gradients during backpropagation through time.
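
The role of an orthogonal recurrent matrix can be illustrated with a minimal sketch. This is not the SGORNN implementation: it assumes a hypothetical 2x2 rotation as the orthogonal matrix and a hypothetical copy scaled by 1.1 as the non-orthogonal comparison. It only shows that repeated multiplication by the recurrent matrix, the operation that can make gradients grow during backpropagation through time, preserves the vector norm in the orthogonal case.

```python
import math


def matvec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def norm(v):
    """Euclidean norm of a length-2 vector."""
    return math.hypot(v[0], v[1])


def rotation(theta):
    """A 2x2 rotation matrix, which is orthogonal (Q^T Q = I)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def norm_after_steps(m, v, steps):
    """Norm of v after applying m repeatedly, as in a linear recurrence."""
    for _ in range(steps):
        v = matvec(m, v)
    return norm(v)


q = rotation(0.3)                           # orthogonal (illustrative choice)
w = [[1.1 * a for a in row] for row in q]   # same rotation scaled by 1.1, not orthogonal
v0 = [1.0, 0.0]

orth_norm = norm_after_steps(q, v0, 200)    # stays at 1 up to rounding error
scaled_norm = norm_after_steps(w, v0, 200)  # grows like 1.1 ** 200
```

The scaled matrix stands in for any recurrent weight whose largest singular value exceeds 1; the scalar gates and nonlinearity described in the abstract are deliberately left out.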


Subject(s)
Language , Computer Neural Networks , Psychological Generalization
3.
Sensors (Basel) ; 22(16)2022 Aug 13.
Article in English | MEDLINE | ID: mdl-36015824

ABSTRACT

Automatic Identification System (AIS) messages are useful for tracking vessel activity across oceans worldwide using radio links and satellite transceivers. Such data play a significant role in tracking vessel activity and mapping mobility patterns such as those found during fishing activities. Accordingly, this paper proposes a geometric-driven semi-supervised approach for fishing activity detection from AIS data. Through the proposed methodology, it is shown how to explore the information included in the messages to extract features describing the geometry of the vessel route. To this end, we leverage the unsupervised nature of cluster analysis to label the trajectory geometry, highlighting changes in the vessel's moving pattern, which tend to indicate fishing activity. The labels obtained by the proposed unsupervised approach are used to detect fishing activities, which we approach as a time-series classification task. We propose a solution using recurrent neural networks on AIS data streams, achieving an overall F-score of roughly 87% on the whole trajectories of 50 different unseen fishing vessels. These results are accompanied by a broad benchmark study assessing the performance of different Recurrent Neural Network (RNN) architectures. In conclusion, this work contributes a thorough process that includes data preparation, labeling, data modeling, and model validation. We therefore present a novel solution for mobility pattern detection that relies upon unfolding the geometry observed in the trajectory.


Subject(s)
Hunting , Computer Neural Networks , Cluster Analysis , Oceans and Seas
4.
Bioinformatics ; 38(11): 3051-3061, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35536192

ABSTRACT

MOTIVATION: There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein-protein interactions and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations. RESULTS: We introduce deepSimDEF, a deep learning method to automatically learn FS estimation of gene pairs given a set of genes and their GO annotations. deepSimDEF's key novelty is its ability to learn low-dimensional embedding vector representations of GO terms and gene products and then calculate FS using these learned vectors. We show that deepSimDEF can predict the FS of new genes using their annotations: it outperformed all other FS measures by >5-10% on yeast and human reference datasets on protein-protein interactions, gene co-expression and sequence homology tasks. Thus, deepSimDEF offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics, and its architecture is flexible enough to support its extension to any organism. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/ahmadpgh/deepSimDEF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Proteins , Humans , Gene Ontology , Computational Biology/methods , Molecular Sequence Annotation , Software , Saccharomyces cerevisiae , RNA
5.
J Healthc Inform Res ; 6(2): 174-207, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35194569

ABSTRACT

The COVID-19 pandemic has affected people's lives in many ways. Social media data can reveal public perceptions and experience with respect to the pandemic, and also reveal factors that hamper or support efforts to curb global spread of the disease. In this paper, we analyzed COVID-19-related comments collected from six social media platforms using natural language processing (NLP) techniques. We identified relevant opinionated keyphrases and their respective sentiment polarity (negative or positive) from over 1 million randomly selected comments, and then categorized them into broader themes using thematic analysis. Our results uncover 34 negative themes out of which 17 are economic, socio-political, educational, and political issues. Twenty (20) positive themes were also identified. We discuss the negative issues and suggest interventions to tackle them based on the positive themes and research evidence.

6.
J Big Data ; 8(1): 96, 2021.
Article in English | MEDLINE | ID: mdl-34760434

ABSTRACT

Urban data such as demographics, infrastructure, and criminal records are becoming more accessible to researchers. This has led to improvements in quantitative crime research for predicting future crime occurrence by identifying factors and knowledge from instances that contribute to criminal activities. While crime distribution in geographic space is asymmetric, there are often analogous, implicit criminogenic factors hidden in the data. Moreover, since data are less available and less comprehensive for smaller cities, it is challenging to build a uniform framework for all geographic regions. This paper addresses the crime prediction task from a cross-domain perspective to tackle the data insufficiency problem in a small city. We create a uniform outline for Halifax, Nova Scotia, one of Canada's geographic regions, by adapting and learning knowledge from two different domains, Toronto and Vancouver, whose distributions are different from, but related to, that of Halifax. For transferring knowledge among source and target domains, we propose applying instance-based transfer learning settings. Each setting is directed to learning knowledge based on a seasonal perspective with cross-domain data fusion. We choose ensemble learning methods for model building because they generalize well to new data. We evaluate the classification performance for both single and multi-domain representations and compare the results with baseline models. Our findings show satisfactory performance for our proposed data-driven approach, which integrates multiple sources of data.

7.
Bone Jt Open ; 2(8): 679-684, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34409843

ABSTRACT

AIMS: In countries with social healthcare systems, such as Canada, patients may experience long wait times and a decline in their health status prior to their operation. The aim of this study is to explore the association between long preoperative wait times (WT) and acute hospital length of stay (LoS) for primary arthroplasty of the knee and hip. METHODS: The study population was obtained from the provincial Patient Access Registry Nova Scotia (PARNS) and the Canadian national hospital Discharge Access Database (DAD). We included primary total knee and hip arthroplasties (TKA, THA) between 2011 and 2017. Patients waiting longer than the recommended Canadian national standard of 180 days were compared to patients waiting no longer than the standard WT. The primary outcome measure was acute LoS postoperatively. Secondarily, patient demographics, comorbidities, and perioperative parameters were correlated with LoS using multivariate regression. RESULTS: A total of 11,833 TKAs and 6,627 THAs were included in the study. Mean WT for TKA was 348 days (1 to 3,605) with mean LoS of 3.6 days (1 to 98). Mean WT for THA was 267 days (1 to 2,015) with mean LoS of 4.0 days (1 to 143). There was a significant increase in mean LoS for TKA patients waiting longer than 180 days (2.5% (SE 1.1); p = 0.028). There was no significant association for THA. Age, sex, surgical year, admittance from home, rural residence, household income, hospital facility, the need for blood transfusion, and comorbidities were all found to influence LoS. CONCLUSION: Surgical WT longer than 180 days resulted in increased acute LoS for primary TKA. Meeting a shorter WT target may be cost-saving in a social healthcare system by shortening LoS. Cite this article: Bone Jt Open 2021;2(8):679-684.

8.
Sensors (Basel) ; 21(12)2021 Jun 11.
Article in English | MEDLINE | ID: mdl-34207959

ABSTRACT

Corrosion identification and repair is a vital task in aircraft maintenance to ensure continued structural integrity. For fuselage lap joints, visual inspections are typically followed by non-destructive methodologies, which are time-consuming. The visual inspection of large areas suffers not only from subjectivity but also from the variable probability of corrosion detection, which is aggravated by the multiple layers used in fuselage construction. In this paper, we propose a methodology for automatic image-based corrosion detection of aircraft structures using deep neural networks. For machine learning, we use a dataset that consists of D-Sight Aircraft Inspection System (DAIS) images from different lap joints of Boeing and Airbus aircraft. We also employ transfer learning to overcome the shortage of aircraft corrosion images. With precision of over 93%, we demonstrate that our approach detects corrosion with a precision comparable to that of trained operators, helping to reduce the uncertainties related to operator fatigue or inadequate training. Our results indicate that our methodology can support specialists and engineers in corrosion monitoring in the aerospace industry, potentially contributing to the automation of condition-based maintenance protocols.


Subject(s)
Aircraft , Artificial Intelligence , Automation , Corrosion , Computer Neural Networks
9.
J Acoust Soc Am ; 149(4): 2520, 2021 04.
Article in English | MEDLINE | ID: mdl-33940913

ABSTRACT

Passive acoustic monitoring (PAM) is a useful technique for monitoring marine mammals. However, the quantity of data collected through PAM systems makes automated algorithms for detecting and classifying sounds essential. Deep learning algorithms have shown great promise in recent years, but their performance is limited by the lack of sufficient amounts of annotated data for training the algorithms. This work investigates the benefit of augmenting training datasets with synthetically generated samples when training a deep neural network for the classification of North Atlantic right whale (Eubalaena glacialis) upcalls. We apply two recently proposed augmentation techniques, SpecAugment and Mixup, and show that they improve the performance of our model considerably. The precision is increased from 86% to 90%, while the recall is increased from 88% to 93%. Finally, we demonstrate that these two methods yield a significant improvement in performance in a scenario of data scarcity, where few training samples are available. This demonstrates that data augmentation can reduce the annotation effort required to achieve a desirable performance threshold.
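
As a rough illustration of one of the two augmentation techniques named above, the following minimal sketch applies Mixup to a pair of examples: inputs and one-hot labels are blended with a weight drawn from a Beta(alpha, alpha) distribution. The feature vectors, the label encoding, and the value alpha = 0.2 are assumptions made for this example and are not taken from the study.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical toy features standing in for spectrogram values.
rng = random.Random(0)
upcall = [0.9, 0.8, 0.1]   # labelled [1, 0]: upcall present
noise = [0.1, 0.2, 0.7]    # labelled [0, 1]: no upcall
x_mix, y_mix, lam = mixup(upcall, [1.0, 0.0], noise, [0.0, 1.0], rng=rng)
```

The blended label remains a valid probability vector, so the synthetic sample can be used with a cross-entropy loss that accepts soft targets.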


Subject(s)
Sound , Whales , Algorithms , Animals , Atlantic Ocean , Computer Neural Networks
10.
Article in English | MEDLINE | ID: mdl-33905327

ABSTRACT

Time-series forecasting is one of the most active research topics in artificial intelligence. An open gap in the literature is that statistical and ensemble learning approaches systematically present lower predictive performance than deep learning methods. They generally disregard the data sequence aspect entangled with multivariate data represented in more than one time series. Conversely, this work presents a novel neural network architecture for time-series forecasting that combines the power of graph evolution with deep recurrent learning on distinct data distributions; we named our method Recurrent Graph Evolution Neural Network (ReGENN). The idea is to infer multiple multivariate relationships between co-occurring time series by assuming that the temporal data depend not only on inner variables and intra-temporal relationships (i.e., observations from itself) but also on outer variables and inter-temporal relationships (i.e., observations from other-selves). An extensive set of experiments was conducted comparing ReGENN with dozens of ensemble methods and classical statistical ones, showing sound improvement of up to 64.87% over the competing algorithms. Furthermore, we present an analysis of the intermediate weights arising from ReGENN, showing that by looking at inter- and intra-temporal relationships simultaneously, time-series forecasting is greatly improved by paying attention to how multiple multivariate data evolve synchronously.

11.
Ethics Inf Technol ; 23(Suppl 1): 1-6, 2021.
Article in English | MEDLINE | ID: mdl-33551673

ABSTRACT

The rapid dynamics of COVID-19 call for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in "phase 2" of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion, and in turn this enables both contact tracing and the early detection of outbreak hotspots on a more finely granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, we recommend extending existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device and allow the user to share spatio-temporal aggregates (if and when they want, and for specific aims) with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

12.
Netw Neurosci ; 4(3): 528-555, 2020.
Article in English | MEDLINE | ID: mdl-32885114

ABSTRACT

Adherence determines the success and benefits of mental training (e.g., meditation) programs. It is unclear why some participants engage more actively in programs for mental training than others. Understanding neurobiological factors that predict adherence is necessary for understanding elements of learning and to inform better designs for new learning regimens. Clustering patterns in brain networks have been suggested to predict learning performance, but it is unclear whether these patterns contribute to motivational aspects of learning such as adherence. This study tests whether configurations of brain connections in resting-state fMRI scans can be used to predict adherence to two programs: meditation and creative writing. Results indicate that greater system segregation and clustering predict the number of practice sessions and class participation in both programs at a wide range of network thresholds (corrected p value < 0.05). At a local level, regions in subcortical circuitry such as striatum and accumbens predicted adherence in all subjects. Furthermore, there were also some important distinctions between groups: Adherence to meditation was predicted by connectivity within local network of the anterior insula and default mode network; and in the writing program, adherence was predicted by network neighborhood of frontal and temporal regions. Four machine learning methods were applied to test the robustness of the brain metric for classifying individual capacity for adherence and yielded reasonable accuracy. Overall, these findings underscore the fact that adherence and the ability to perform prescribed exercises is associated with organizational patterns of brain connectivity.

13.
J Acoust Soc Am ; 147(4): 2636, 2020 04.
Article in English | MEDLINE | ID: mdl-32359246

ABSTRACT

Passive acoustics provides a powerful tool for monitoring the endangered North Atlantic right whale (Eubalaena glacialis), but robust detection algorithms are needed to handle diverse and variable acoustic conditions and differences in recording techniques and equipment. This paper investigates the potential of deep neural networks (DNNs) for addressing this need. ResNet, an architecture commonly used for image recognition, was trained to recognize the time-frequency representation of the characteristic North Atlantic right whale upcall. The network was trained on several thousand examples recorded at various locations in the Gulf of St. Lawrence in 2018 and 2019, using different equipment and deployment techniques. Used as a detection algorithm on fifty 30-min recordings from the years 2015-2017 containing over one thousand upcalls, the network achieved recalls up to 80% while maintaining a precision of 90%. Importantly, the performance of the network improved as more variance was introduced into the training dataset, whereas the opposite trend was observed using a conventional linear discriminant analysis approach. This study demonstrates that DNNs can be trained to identify North Atlantic right whale upcalls under diverse and variable conditions with a performance that compares favorably to that of existing algorithms.


Subject(s)
Acoustics , Whales , Algorithms , Animals , Atlantic Ocean , Discriminant Analysis , Computer Neural Networks
14.
J Am Med Inform Assoc ; 26(5): 438-446, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30811548

ABSTRACT

OBJECTIVE: In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semisupervised and knowledge-based methods; however, they train one separate classifier for each ambiguous term, necessitating a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. MATERIALS AND METHODS: Built on recent advances in deep learning, our deepBioWSD model leverages a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed using their text definitions; then, after the network is initialized with these embeddings, it is trained on all (available) training data collectively. This method also considers a novel technique for automatic collection of training data from PubMed to (pre)train the network in an unsupervised manner. RESULTS: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD by achieving the state-of-the-art performance of 96.82% for macro accuracy. CONCLUSIONS: Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably fewer expert-labeled data, as it learns the target and the context terms jointly. These merits make deepBioWSD conveniently deployable in real-time biomedical applications.


Subject(s)
Data Mining/methods , Deep Learning , Natural Language Processing , Computer Neural Networks , Controlled Vocabulary , Algorithms , Biological Ontologies , Datasets as Topic , Medical Subject Headings , Systematized Nomenclature of Medicine , Unified Medical Language System
15.
IEEE Comput Graph Appl ; 37(5): 28-39, 2017.
Article in English | MEDLINE | ID: mdl-28945577

ABSTRACT

The increasing availability and use of positioning devices has resulted in large volumes of trajectory data. However, semantic annotations for such data are typically added by domain experts, which is a time-consuming task. Machine-learning algorithms can help infer semantic annotations from trajectory data by learning from sets of labeled data. Specifically, active learning approaches can minimize the set of trajectories to be annotated while preserving good performance measures. The ANALYTiC web-based interactive tool visually guides users through this annotation process.

16.
PLoS One ; 11(9): e0163760, 2016.
Article in English | MEDLINE | ID: mdl-27657928

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0158248.].

17.
PLoS One ; 11(7): e0158248, 2016.
Article in English | MEDLINE | ID: mdl-27367425

ABSTRACT

A key challenge in contemporary ecology and conservation is the accurate tracking of the spatial distribution of various human impacts, such as fishing. While coastal fisheries in national waters are closely monitored in some countries, existing maps of fishing effort elsewhere are fraught with uncertainty, especially in remote areas and the High Seas. Better understanding of the behavior of the global fishing fleets is required in order to prioritize and enforce fisheries management and conservation measures worldwide. Satellite-based Automatic Identification Systems (S-AIS) are now commonly installed on most ocean-going vessels and have been proposed as a novel tool to explore the movements of fishing fleets in near real time. Here we present approaches to identify fishing activity from S-AIS data for three dominant fishing gear types: trawl, longline and purse seine. Using a large dataset containing worldwide fishing vessel tracks from 2011-2015, we developed three methods to detect and map fishing activities. For trawlers, we produced a Hidden Markov Model (HMM) using vessel speed as the observation variable. For longliners, we designed a Data Mining (DM) approach using an algorithm inspired by studies on animal movement. For purse seiners, a multi-layered filtering strategy based on vessel speed and operation time was implemented. Validation against expert-labeled datasets showed average detection accuracies of 83% for trawlers and longliners, and 97% for purse seiners. Our study represents the first comprehensive approach to detect and identify potential fishing behavior for three major gear types operating on a global scale. We hope that this work will enable new efforts to assess the spatial and temporal distribution of global fishing effort and make global fisheries activities transparent to ocean scientists, managers and the public.
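
The idea of decoding fishing activity from vessel speed with a Hidden Markov Model can be sketched as follows. This is a minimal illustration and not the authors' model: the two states, the 5-knot threshold used to discretize speed, and every probability below are hypothetical values chosen for the example. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

# All parameters below are hypothetical, chosen only for illustration.
STATES = ("steaming", "fishing")
START = {"steaming": 0.5, "fishing": 0.5}
TRANS = {"steaming": {"steaming": 0.9, "fishing": 0.1},
         "fishing": {"steaming": 0.1, "fishing": 0.9}}
EMIT = {"steaming": {"slow": 0.1, "fast": 0.9},
        "fishing": {"slow": 0.9, "fast": 0.1}}


def discretize(speed_knots, threshold=5.0):
    """Map a speed in knots to a discrete observation symbol."""
    return "slow" if speed_knots < threshold else "fast"


def viterbi(observations):
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][obs]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


speeds = [11.0, 10.5, 3.2, 2.8, 6.0, 3.1, 2.9, 10.8, 11.2]  # made-up track, in knots
labels = viterbi([discretize(s) for s in speeds])
```

With these assumed parameters, the single faster reading in the middle of the slow run is kept inside the fishing segment, because the transition probabilities penalize rapid switching between states.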


Subject(s)
Data Mining/methods , Fisheries/statistics & numerical data , Machine Learning , Automated Pattern Recognition/methods , Spacecraft , Animals
18.
Bioinformatics ; 32(9): 1380-7, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26708333

ABSTRACT

MOTIVATION: Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. RESULTS: We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by > 2.5% in F1 score for molecular function hierarchy. AVAILABILITY AND IMPLEMENTATION: Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF CONTACT: ahmad.pgh@dal.ca or beiko@cs.dal.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
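
The similarity step described above, the cosine of the angle between two definition vectors, can be written as a short sketch. How simDEF builds and optimizes the definition vectors is not reproduced here; the function name and the toy vectors are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical definition vectors for two proteins.
protein_a = [1.0, 2.0, 3.0]
protein_b = [2.0, 4.0, 6.0]   # same direction as protein_a
protein_c = [3.0, 0.0, -1.0]  # orthogonal to protein_a

same_direction = cosine_similarity(protein_a, protein_b)
orthogonal = cosine_similarity(protein_a, protein_c)
```

Because the cosine depends only on direction, two vectors that differ only in magnitude score as maximally similar.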


Subject(s)
Computational Biology , Gene Ontology , Algorithms , Animals , Humans , Proteins , Semantics
19.
J Med Internet Res ; 15(10): e215, 2013 Oct 03.
Article in English | MEDLINE | ID: mdl-24091380

ABSTRACT

BACKGROUND: Participants in medical forums often reveal personal health information about themselves in their online postings. To feel comfortable revealing sensitive personal health information, some participants may hide their identity by posting anonymously. They can do this by using fake identities, nicknames, or pseudonyms that cannot readily be traced back to them. However, individual writing styles have unique features and it may be possible to determine the true identity of an anonymous user through author attribution analysis. Although there has been previous work on the authorship attribution problem, there has been a dearth of research on automated authorship attribution on medical forums. The focus of the paper is to demonstrate that character-based author attribution works better than word-based methods in medical forums. OBJECTIVE: The goal was to build a system that accurately attributes authorship of messages posted on medical forums. The Authorship Attributor system uses text analysis techniques to crawl medical forums and automatically correlate messages written by the same authors. Authorship Attributor processes unstructured texts regardless of the document type, context, and content. METHODS: The messages were labeled by nicknames of the forum participants. We evaluated the system's performance through its accuracy on 6000 messages gathered from 2 medical forums on an in vitro fertilization (IVF) support website. RESULTS: Given 2 lists of candidate authors (30 and 50 candidates, respectively), we obtained an F score accuracy in detecting authors of 75% to 80% on messages containing 100 to 150 words on average, and 97.9% on longer messages containing at least 300 words. CONCLUSIONS: Authorship can be successfully detected in short free-form messages posted on medical forums. This raises a concern about the meaningfulness of anonymous posting on such medical forums. Authorship attribution tools can be used to warn consumers wishing to post anonymously about the likelihood of their identity being determined.
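
A character-based approach of the kind referred to above can be sketched with character n-gram profiles. This is an assumed, simplified technique for illustration, not the Authorship Attributor system itself: each author is represented by trigram counts from known text, and a new message is assigned to the author whose profile has the highest cosine similarity. The nicknames and messages are invented.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile_similarity(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())  # Counter gives 0 for missing keys
    norm = (math.sqrt(sum(c * c for c in p.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    return dot / norm if norm else 0.0


def attribute(message, profiles, n=3):
    """Return the candidate author whose profile is most similar to the message."""
    target = char_ngrams(message, n)
    return max(profiles, key=lambda author: profile_similarity(target, profiles[author]))


# Invented sample texts for two hypothetical nicknames.
known = {
    "nickname_a": char_ngrams("omg!!! soooo nervous about my transfer tmrw!!! fingers crossed!!!"),
    "nickname_b": char_ngrams("I am, however, cautiously optimistic; the clinic has been thorough."),
}
guess = attribute("soooo excited!!! transfer went well omg!!!", known)
```

Character n-grams capture punctuation habits and spelling quirks that word-level features would discard, which is one plausible reason they suit short, informal forum messages.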


Subject(s)
Authorship , Confidentiality , Internet , Humans
...